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ABSTRACT 

With  modern  focus  on  LiDAR  technology  the  amount  of  topo¬ 
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dramatically.  Furthermore,  due  to  the  popularity  of  LiDAR,  re¬ 
peated  surveys  of  the  same  areas  are  becoming  more  common. 
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1  Introduction 

Modern  sensing  and  mapping  technologies  such  as  airborne  LiDAR 
technology  can  rapidly  map  the  earth’s  surface  at  15 -20cm  horizon¬ 
tal  resolution.  Since  LiDAR  provides  a  relatively  simple  and  cheap 
way  of  surveying  large  terrains,  some  areas  are  (or  will  be)  sur¬ 
veyed  multiple  times,  yielding  large  spatio-temporal  data  sets.  For 
example,  the  coastal  region  of  North  Carolina  has  been  mapped 
every  1-2  years  since  the  mid  1990’s[l].  Figure  1  shows  a  spatio- 
temporal  model  of  a  portion  of  this  region  constructed  from  the  data 
set.  Such  data  sets  provide  tremendous  opportunities  for  a  wide 
range  of  commercial,  scientific,  and  military  applications.  For  in¬ 
stance,  unmanned  aerial  vehicles  can  survey  an  area  over  a  certain 
time  period  and  the  resulting  spatio-temporal  topographic  data  can 
be  used  to  detect  the  location  of  new  buildings,  machinery,  or  veg¬ 
etation.  On  a  larger  scale,  they  can  be  used  to  offer  interesting  in¬ 
sights  into  understanding  terrain  dynamics.  For  example,  there  has 
been  much  interest  on  studying  the  movement  of  sand  dunes  and 
erosion  at  the  North  Carolina  coast  [18].  It  is  essential  for  many 
applications  to  exploit  the  high-resolution  data  sets  since  small  fea¬ 
tures  may  have  a  large  impact  on  the  output.  For  instance,  it  is  vital 
that  dikes  and  other  features  are  present  in  data  sets  used  for  hydro- 
logical  modeling,  but  these  features  are  often  relatively  small  and 
unlikely  to  disappear  if  the  resolution  is  low  [5,  19]. 

Capitalizing  on  the  opportunities  presented  by  high-resolution 
data  sets  and  transforming  massive  amount  of  spatio-temporal  to¬ 
pographic  data  into  useful  information  requires  that  several  algo¬ 
rithmic  challenges  be  addressed.  To  begin  with,  the  scattered  point 
set  S  generated  by  LiDAR  cannot  be  used  directly  by  many  GIS  al¬ 
gorithms.  They  instead  operate  on  a  digital  elevation  model  (DEM). 
Because  of  its  simplicity  and  efficiency,  the  most  widely  used  DEM 
is  a  2D  uniform  grid  in  which  an  elevation  value  is  stored  at  each 
cell.  For  spatio-temporal  data,  this  translates  into  a  3D  grid  with 
time  representing  the  third  dimension.  One  has  to  extend  the  ele¬ 
vation  measured  by  LiDAR  at  the  points  of  S  via  interpolation  to 
a  uniform  grid  Q  c  R3  of  the  desired  resolution;  the  interpolation 
is  performed  in  both  time  and  space.  We  note  that  the  need  to  in¬ 
terpolate  a  spatio-temporal  data  set  on  a  uniform  grid  arises  in  a 
variety  of  applications.  For  example,  one  may  want  to  interpolate 
data  generated  by  a  sensor  network  deployed  over  a  wide  area  [9]. 

There  is  extensive  work  on  statistical  modeling  of  spatio-temporal 
data  in  many  disciplines,  including  geostatistics,  environmental  sci¬ 
ences,  GIS,  atmospheric  science,  and  biomedical  engineering.  It 
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(a)  Interpolated  DEM  in  2002  (b)  Interpolated  DEM  in  2005  (c)  Aerial  photo  of  the  dune 

Figure  1.  Jockey’s  Ridge  State  Park  sand  dune  in  Nags  Head,  NC,  USA. 


is  beyond  the  scope  of  this  paper  to  discuss  these  methods  here. 
We  refer  to  [12,  14,  24]  for  reviews  of  many  such  results.  In  the 
context  of  GIS,  interpolation  methods  based  on  kriging,  inverse 
distance  weighting,  shape  functions,  random  Markov  fields,  and 
splines  have  been  proposed;  see  [17,  22,  13,  15]  and  references 
therein.  Although  sophisticated  methods  such  as  kriging  and  RST 
(spline  based  method  [17])  produce  high  quality  output,  especially 
when  data  is  sparse,  they  are  computationally  expensive  and  not 
scalable.  On  the  other  hand,  simple  methods  such  as  constructing 
triangulation  on  input  points  in  R3  [13]  and  linearly  interpolating 
the  elevation  on  grid  points,  though  efficient,  generate  artifacts  in 
the  output  and  are  not  suitable  for  many  GIS  applications. 

In  this  paper  we  use  the  well-known  natural  neighbor  interpo¬ 
lation  (NNI)  strategy  [23].  Given  a  finite  set  S  of  points  in  Rd,  a 
height  function  /*  :  S'  — >  R  can  be  extended  to  the  entire  Rd  using 
natural  neighbor  interpolation.  In  particular,  for  a  point  q  e  Rd, 

h(q)  =  E  Wp(q)h(p\  (1) 

p&S 


where  wp(q)  e  [0, 1]  is  the  fractional  volume  of  Vor^u^^)  that 
belongs  to  Vor s(p)  (see  Figure  2  for  a  2D  example),  i.e., 


wp(q)  = 


Vol(Vors  (p)  n  Vorsu^O?)) 
Vol(Vor,u{,}(^)) 


(2) 


where  VorA(z)  denotes  the  Voronoi  cell  of  the  point  z  G  Ain  the 
Voronoi  diagram  of  A.  See  Section  3  for  a  formal  definition  of  the 
Voronoi  diagram.  NNI  produces  a  smoother  surface  than  linear  in¬ 
terpolation  and  readily  extends  to  spatio-temporal  data.  However, 
computing  NNI  on  grids  in  3D  is  challenging  and  expensive.  First, 
the  size  of  the  Voronoi  diagram  in  3D  can  be  quadratic  in  the  worst 
case  even  for  the  Euclidean  metric  [6].  As  mentioned  below,  for 
spatio-temporal  data,  we  may  have  to  compute  the  Voronoi  dia¬ 
gram  under  more  general  metrics.  Even  if  the  size  of  the  Voronoi 
diagram  is  near  linear  for  realistic  data  sets  [3],  computing  it  is  still 
expensive,  especially  on  large  data  sets  that  do  not  fit  in  main  mem¬ 
ory.  Since  S  can  not  be  assumed  to  be  in  general  position,  robust 
implementations  must  take  great  care  to  handle  cases  such  as  point 
duplicates  and  groups  of  multiple  points  on  the  same  plane/sphere; 
the  existing  implementations  of  Voronoi  diagram  construction  such 
as  Qhull  are  slow  on  degenerate  inputs.  Furthermore,  performing 
NNI  involves  computing  the  intersection  of  two  polyhedra  and  the 
volume  of  this  intersection.  Hemsley  [10]  has  implemented  NNI 
in  3D  under  the  Euclidean  metric,  but  the  implementation  is  not 
scalable;  see  Section  5  for  more  details.  We  are  unaware  of  any  ro¬ 
bust  implementation  of  NNI  for  points  in  R3  that  can  handle  large 
data  sets.  Because  of  these  challenges,  we  propose  to  discretize  the 
Voronoi  diagram  and  compute  NNI  approximately;  the  approxima¬ 
tion  error  can  be  controlled  by  choosing  discretization  parameters 


appropriately. 

To  attain  significant  speed-up  in  the  NNI  computation,  we  ex¬ 
ploit  the  graphics  processing  units  (GPUs)  available  on  modern 
PCs.  Although  originally  designed  for  quickly  rendering  3D  ge¬ 
ometric  scenes  on  an  image  plane  (screen)  and  extensively  used  in 
video  games,  they  can  be  regarded  as  massively  parallel  vector  pro¬ 
cessors  suitable  for  general  purpose  computing.  Known  as  general 
purpose  GPUs  (GPGPUs),  their  tremendous  computational  power 
and  memory  bandwidth  make  them  attractive  for  applications  far 
beyond  the  original  goal  of  rendering  complex  3D  scenes.  As  GPUs 
have  become  more  flexible  and  programmable  (e.g.  NVIDIA’ s 
CUDA  [20]  library)  they  have  been  used  for  a  wide  range  of  ap¬ 
plications,  e.g.,  geometric  computing,  robotic  collision  detection, 
database  systems,  fluid  dynamics,  and  solving  sparse  linear  sys¬ 
tems;  see  [21]  for  a  recent  survey.  In  the  context  of  grid  DEM 
construction,  Fan  et  al.  [8]  and  Beutel  et  al.  [5]  have  described  a 
GPU  based  algorithm  for  (NNI)  in  2D. 

Our  contributions.  Let  S  c  R3  be  a  set  of  LiDAR  points  on  which 
elevation  is  measured,  and  let  Q  be  a  3D  grid  of  points  at  which  we 
want  to  compute  the  elevation.  We  present  a  simple  yet  very  fast 
GPU  based  algorithm,  called  TerraNNI  for  performing  a  variant  of 
NNI,  which  is  more  suitable  for  our  application,  on  the  points  of 
Q.  Our  algorithm  is  a  generalization  of  the  algorithm  described 
in  [5],  but  several  new  ideas  are  needed  in  3D.  LiDAR  scanners 
provide  dense  point  cloud  of  elevation  data  at  most  locations,  but 
there  are  gaps.  In  space,  these  gaps  usually  appear  at  large  bodies 
of  water  or  human-made  objects  that  have  been  removed  from  the 
point  cloud  in  a  preprocessing  step.  In  time,  the  gaps  appear  when 
surveys  fail  to  cover  exactly  the  same  region  as  previous  surveys. 
When  these  gaps  are  large  (i.e.,  there  is  a  large  region  in  space  and 
time  with  no  data)  we  wish  to  label  the  corresponding  gap  cells  in 
the  volumetric  grid  with  nodata  instead  of  interpolating  elevation 
based  on  points  that  are  very  far  away.  We  extend  the  notion  of  the 
region  of  influence  of  an  input  point  as  introduced  in  [5],  to  include 
time.  We  compute  the  elevation  at  a  grid  point  using  only  those 
input  points  that  lie  within  its  region  of  influence.  Our  algorithm 
allows  the  “radius”  of  region  of  influence  to  be  arbitrarily  large.  If 
it  is  set  sufficiently  large  then  the  algorithm  computes  the  standard 
NNI. 

Since  we  are  working  with  spatio-temporal  data,  Euclidean  dis¬ 
tance  may  not  necessarily  be  the  appropriate  function  to  measure 
the  distance  between  two  points.  We  therefore  assume  that  we  have 
a  metric  d(-,  •)  and  compute  Voronoi  diagrams  under  this  metric. 
While  computing  Voronoi  diagrams  on  the  CPU  for  general  met¬ 
rics  is  quite  hard,  we  show  that  it  is  relatively  straightforward  on  a 
GPU.  We  also  show  how  the  computation  of  Voronoi  diagrams  can 
be  optimized  for  “weighted”  Euclidean  distance. 

The  main  difficulty  in  performing  NNI  on  the  points  of  the  3D 
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Figure  2.  Natural  neighbor  interpolation  for  a  point  set  S  and  query  point 
q  in  R2.  The  shaded  cell  is  Vor5U{?}(g),  and  each  color  denotes  the  area 
stolen  from  each  cell  of  Vor (S). 


grid  Q  is  that  GPUs  support  only  two-dimensional  buffers,  so  un¬ 
like  in  [5],  we  cannot  perform  NNI  on  all  grid  points  in  one  pass. 
Instead,  we  handle  one  2D  slice  of  Q  at  a  time.  This  involves  pro¬ 
cessing  each  input  point  several  times.  We  present  three  algorithms, 
which  provide  trade  offs  between  the  number  of  times  each  point 
is  processed  and  the  space  (on  the  GPU)  used  by  the  algorithm. 
The  first  algorithm  stores  only  one  copy  of  the  GPU  frame  buffer 
but  processes  each  point  0(r2)  times,  where  r  is  the  “radius”  of  in¬ 
fluence  along  the  time  axis,  see  Section  3  for  a  precise  definition 
(here  we  assume  that  the  temporal  resolution  of  Q  is  1).  The  sec¬ 
ond  algorithm  reduces  the  number  of  passes  over  the  input  points 
to  0(r )  by  storing  r  copies  of  the  GPU  frame  buffer.  Finally,  if  the 
input  data  is  time-series  data,  i.e.,  S  =  E1, . . .  ,Er  for  \T\  |S|, 

where  all  points  in  E*  have  the  same  time  coordinate,  and  if  the 
distance  d(-,  ■)  is  a  weighted  Euclidean  metric  (see  Section  3  for 
the  definition),  we  describe  an  algorithm  that  makes  only  one  pass 
through  the  input  points,  under  the  assumption  that  some  primitive 
operations  on  the  GPU  buffers  can  be  performed;  see  Section  4  for 
details.  The  algorithm  extends  to  certain  other  metrics  as  well,  but 
we  omit  this  extension  from  this  version  of  the  paper.  We  refer  to 
Table  1  in  Section  4  for  an  overview  of  the  relative  performance  of 
our  algorithms. 

Our  experimental  results  demonstrate  that  TerraNNI  is  both  scal¬ 
able  and  efficient.  It  interpolates  over  450  million  input  points  span¬ 
ning  2  years  (8.8GB  on  disk)  in  26  minutes.  We  also  test  the  mem¬ 
ory  trade-offs  on  a  smaller  data  set  consisting  of  about  20  million 
data  points  spanning  1 1  years.  The  simplest  algorithm  processing 
each  point  0(r2)  times  is  2-2.2  times  slower  than  the  algorithm  that 
only  processes  each  point  0(r )  times  at  a  cost  of  using  r  frame 
buffers.  The  latter  took  six  and  a  half  minutes  to  interpolate  at 
48  million  points.  We  also  tested  the  Interpolate3d  [10]  package 
which  is  an  implementation  of  NNI  on  the  CPU.  Interpolate3d  was 
unable  to  handle  the  full  20  million  point  data  set  with  the  8GB 
of  memory  available  in  the  machine,  but  was  able  to  interpolate  a 
10  million  point  subset  in  about  an  hour  and  a  half.  We  also  com¬ 
pared  TerraNNI  against  Qhull  [4]  and  Cgal  [2]  which  both  contain 
state  of  the  art  Delaunay  triangulation  implementations,  which  are 
equivalent  to  the  first  step  of  performing  NNI.  It  took  Cgal  eleven 
minutes  to  create  the  Delaunay  triangulation  of  the  20  million  input 
points,  but  it  crashed  when  attempting  to  triangulate  the  larger  450 
million  point  data  set.  Compare  the  eleven  minutes  for  the  smaller 


data  set  with  the  six  and  a  half  minute  used  by  TerraNNI  to  compute 
the  (discretized)  Voronoi  diagram  and  perform  the  interpolation  at 
48  million  points.  Out  of  those  six  and  a  half  minutes,  only  50  sec¬ 
onds  where  spent  on  loading  the  data  and  generating  the  Voronoi 
diagram.  We  refer  to  Section  5  for  details. 

Finally,  we  apply  the  grid  DEM  constructed  by  TerraNNI  to  an¬ 
alyze  the  terrain  dynamics.  In  particular,  we  interpolate  a  multi¬ 
temporal  LiDAR  of  Nags  Head,  NC  and  study  the  movements  of 
sand  dunes  in  the  coastal  region. 

Overview  of  the  paper.  This  paper  is  organized  as  follows.  Sec¬ 
tion  2  gives  a  short  introduction  to  the  fundamentals  of  the  GPU 
model  of  computation,  and  Section  3  presents  an  algorithm  for 
computing  Voronoi  diagrams  in  3D.  Section  4  presents  the  differ¬ 
ent  algorithms  for  interpolating  on  volumetric  3D  grids,  and  Sec¬ 
tion  5  compares  the  results  from  these  different  algorithms.  Finally, 
we  present  the  application  on  the  NC  coast  data  in  Section  6  and 
present  a  short  conclusion  in  Section  7. 

2  GPU  Model  of  Computation 

In  this  section  we  give  a  brief  overview  of  the  parts  of  the  model  of 
computation  offered  by  GPUs  that  are  relevant  for  our  paper. 

The  graphics  pipeline.  The  graphics  pipeline  is  responsible  for 
drawing  3D  scenes,  composed  of  many  objects,  onto  a  2D  image 
plane  of  pixels  as  seen  from  a  viewpoint  o.  Because  of  their  sim¬ 
plicity  and  flexibility,  these  objects  are  almost  always  triangles.  For 
each  pixel  n  -  (x,y)  where  x,y  is  a  global  coordinate,  the  GPU 
finds  all  objects  Q  =  {coi,co2, . . .  con}  which  ray  on  intersects.  To 
store  the  information  about  the  scene,  the  GPU  keeps  two  dimen¬ 
sional  arrays  of  pixels  called  buffers: 

•  The  depth  buffer  D  stores  the  distance  to  the  nearest  ob¬ 
ject  from  o  for  each  pixel  n.  Given  that  pj  is  the  point  of 
intersection  for  ray  on  and  object  coj,  the  GPU  calculates 
B[/r]  =  mini<;<n  \\opj\\. 

•  The  color  buffer  C  stores  the  color  of  the  scene  as  viewed 
from  o.  If  for  each  object  cjj  e  D  we  have  a  color  Xj,  we 
define  a  blending  function  that  computes  the  color  at  each 
pixel:  C[n]  =  2i </<«  ajXj>  where  aj  e  [0, 1]  is  the  blending 
parameter  of  ojj.  Typically,  ay-  is  based  on  depth  buffer  so 
that  C  stores  the  color  of  the  foremost  object. 

During  graphical  computations,  the  color  and  depth  buffers  re¬ 
side  in  memory  on  the  graphics  card.  Objects  can  be  drawn  onto 
these  buffers  with  specific  APIs  such  as  OpenGL  or  Microsoft’s  Di¬ 
rectX.  In  our  computations,  there  are  cases  where  we  will  want  to 
save  and  reuse  buffers.  Through  the  use  of  OpenGL- specific  APIs, 
we  can  switch  the  buffer  being  used  for  drawing  between  multi¬ 
ple  buffers  stored  in  GPU  memory.  However,  in  some  cases  we 
will  want  to  use  the  values  in  these  buffers  for  computation  on  the 
CPU.  For  this,  we  have  to  read  the  buffer  back  to  the  computer’s 
main  memory.  Unfortunately,  since  this  involves  transferring  large 
amounts  of  data  over  the  relatively  slow  bus  systems,  read  backs 
are  very  slow.  For  using  the  GPUs  parallel  processing  capabili¬ 
ties,  popular  graphics  card  manufacturer  NVIDIA  has  created  the 
CUDA  parallel  computing  architecture  [20],  which  facilitates  gen¬ 
eral  purpose  parallel  programming  on  the  GPU. 

3  Pixelized  Voronoi  Diagram 

Let  S  =  {pi, . . . ,  p„ )  be  a  set  of  points  in  P 3 .  We  view  R3  to  be 
xyt- space  —  x,y  being  spatial  dimension  and  t  being  the  time  axis; 
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each  point  p  =  (xp,yp,  tp)  is  located  in  a  2-dimensional  region,  with 
(xp,yp)  as  its  spatial  coordinates,  and  has  a  time  value  tp  associated 
with  it.  We  thus  view  S  as  spatio-temporal  data.  For  convenience, 
we  set  pi  =  ( Xi,yi,ti ).  For  a  point  p  e  R3,  we  use  p*  =  (xp,yp)  to 
denote  its  projection  on  the  xy-plane.  We  also  assume  that  we  have 
a  metric  d(-,  •)•  Since  we  are  dealing  with  spatio-temporal  data,  the 
Euclidean  metric  may  not  necessarily  be  the  appropriate  function 
to  measure  the  distance  between  two  points.  For  example,  we  may 
want  to  scale  the  Faxis  differently  from  the  x,  y-axis  and  define 

d(p ,  q )  =  {(xp  -  xqf  +  (yp-  yqf  +  oJ2(tp  -  tq)2 )'/2  (3) 

for  some  parameter  co  e  R.  We  refer  to  this  function  as  the  weighted 
Euclidian  metric. 

Pixelized  Voronoi  diagram.  For  a  point  pt  e  S ,  we  define  its 
Voronoi  cell  Vor5  {pi)  to  be 

Vors  (pi)  =  {z  e  R3  |  d(z,  pd  <  d(z,  pj)  1  <  j  <  n}. 

Vor (S )  is  the  subdivision  of  R3  induced  by  the  Voronoi  cells  of 
points  in  S . 

Let  ®  be  an  axis-aligned  box  in  R3  whose  projection  on  the  xy- 
plane  is  a  square.  We  are  interested  in  computing  a  discretized 
version  of  Vor(S)  inside  B,  defined  below.  We  discretize  B  into  a 
NsxNsx  Nt  uniform  3D  grid  of  voxels;  Ns  is  the  spatial  resolution 
of  the  grid  and  Nt  is  its  temporal  resolution.  Each  voxel  is  a  (tiny) 
box  and  its  volume  is  p  -  Vol(B)/(A^A^).  We  use  v  to  denote  the 
center-point  of  voxel  v  and  index  the  voxels  as  (/,  j,  t)  for  0  <  i,  j  < 
Ns  and  0  <  t  <  Nt.  With  a  slight  abuse  of  notation,  we  will  use  B  to 
denote  this  3D  grid  of  voxels.  For  a  voxel  v  e  B,  let  ip(v,S)  denote 
the  point  of  S  that  is  nearest  to  v.  We  discretize  Vor(S)  n  B  by 
assuming  that  the  entire  voxel  v  lies  in  the  Voronoi  cell  of  (p(v,S ). 
For  a  point  pt  e  S ,  we  define  its  pixelized  Voronoi  cell  as 

PVor s(Pi)  =  [v  I  y>(v,  s)  =  pi. 

The  quantity  p\  PVor^  (pH  approximates  the  volume  of  Vor s(pd- 
This  approximation  improves  as  we  increase  the  resolution,  i.e., 
lim^  ^^oo  p\  PVor5 (pi) \  -  Vol(VorS'  (pi  n  B).  PVor (S)  is  the  subdi¬ 
vision  of  B  induced  by  PVor s(pd  for  1  <  i  <  n\  see  Figure  3(a). 

As  in  [5],  we  want  to  limit  the  region  of  influence  R(pi)  of  each 
point  in  S ,  i.e.,  the  distance  of  pt  becomes  infinity  for  points  out¬ 
side  its  region  of  influence.  We  assume  R(pi  to  be  a  “cylinder”  in 
xyFspace  of  radius  r  and  height  2 r.  Namely,  a  point  z  e  R(pd  if 
d(z*,p*)  <  r  and  | tz  -  tt\  <  r.1 

1  One  could  have  defined  the  region  of  influence  to  be  the  sphere  of  radius  r 

centered  at  pi,  but  the  cylindrical  region  is  more  meaningful  in  our  applica¬ 

tion. 


We  now  define  the  truncated  pixelized  Voronoi  cell  of  pt  as 

Vor  as(Pi)  =  [v  |  (p(v,S)  =  Pi  Ave  R(pi). 

Vor D(S)  is  the  subdivision  of  B  induced  by  Vor ns(pi),  i  <  i  <  n. 
Note  that  a  voxel  not  lying  in  R(pi)  for  any  pt  e  S  does  not  belong 
to  the  truncated  Voronoi  cell  of  any  point  of  S . 

Computing  pixelized  Voronoi  diagram.  We  now  describe  a  GPU 
based  algorithm  for  computing  VorD(S),  which  is  a  generalization 
of  the  algorithm  described  in  [5]  and  uses  the  latter  as  a  subroutine. 
For  1  <  i  <  n,  let  f  :  R3  — »  R  be  the  function  that  measures  the 
distance  between  pt  and  a  point  z  in  R3,  i.e.,  f(z)  =  d(z,pi).  The 
lower  envelope  f  of  {/j,  ...,/„}  is  defined  as 

f(z)  =  min  fi(z\ 

1  <i<n 

which  is  the  distance  from  z  to  its  nearest  neighbor  in  S .  Vor(S )  is 
the  projection  of  the  graph  of  /  onto  the  xyFspace.  Following  the 
argument  in  [5,  11],  the  problem  of  computing  PVor(5),  VorD(5) 
can  be  formulated  as  rendering  a  4D  scene  on  a  3D  hyperplane. 
However  GPUs  are  able  to  render  only  3D  scenes  on  a  2D  plane, 
and  they  have  only  2D  buffers.  We  therefore  render  each  2D  slice 
of  B  (parallel  to  the  xy-plane)  separately,  as  described  below. 

Fix  a  time  slice  r  of  B,  i.e.,  the  set  of  voxels  with  time  index  r. 
Let  Flr  be  the  horizontal  plane  passing  through  the  center  points  of 
the  voxels  of  this  slice.  BT  =  B  n  nr  is  a  square  with  an  Ns  x  Ns 
2D  uniform  grid  induced  on  it.  Each  pixel  vT  of  this  2D  grid,  the 
intersection  of  a  voxel  v  with  nr,  is  a  square;  v  is  the  center  point 
of  both  v  and  vT.  For  pt  e  S ,  we  can  thus  define  its  Voronoi  cell 
slice  on  nT  as 

PVor s(Pi,r)  =  [vT  e  Br  |  v  e  PVor s(pd}, 

PVor5  (r)  is  the  subdivision  of  BT  induced  by  these  cell  slices,  which 
can  be  viewed  as  a  discretization  of  the  2D  slice  of  Vor(S )  n  Ut 
inside  B;  see  Figure  3(a,b).  Similarly  we  can  define  Vor DsQ?*,r) 
and  VorDs  (r). 

We  now  define  a  series  of  functions  fj  :  R2  — >  R,  for  1  <  i  <  n, 
as 

J7(x,y)  =  f(x,  y,  t)  =  d((x,y,r),  pi. 

Let  f(x,  y)  =  min f  Jf(x,  y)  be  their  lower  envelope.  A  pixel  vT  e 
PVor s(Pi,T)  if  and  only  if  the  function  fj.  appears  on  the  lower 
envelope  at  the  center  point  of  vT,  i.e.,  jf(vT)  =  fT(vT).  The  graph 
of  f7  is  a  2-dimensional  surface  yt.  For  example,  if  d(-,  •)  is  the 
Euclidean  distance  then 

f!(x,y)  =  {{x  -  Xif  +  (y  -  yd2  +  (r  -  f,)2)1/2 
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Figure  5.  In  the  case  of  the  Euclidian  metric,  yt  can  be  simplified  using 
the  lifting  transform. 


and  its  graph  is  one  sheet  of  a  hyperboloid  of  revolution  of  two 
sheets,  see  Figure  4(a,b).  As  in  [5],  PVor5(T)  can  be  computed 
by  viewing  BT  as  the  2D  image  plane  and  rendering  the  surfaces 
yi, . . . ,  yn  with  (0, 0,  -oo)  as  the  viewpoint.  We  set  the  color  of  yt 
to  /,  then  the  color  buffer  C[/r]  stores  the  index  of  the  point  cp(n,S), 
i.e.,  the  point  whose  Voronoi  cell  contains  /r;  see  Figure  3(c).  In 
order  to  compute  Vor °s(t)  we  need  to  truncate  yt  within  R(pi),  the 
region  of  influence  of  pt,  before  rendering  it.  By  repeating  this  for 
all  slices  0  <  t  <  Nt,  we  can  compute  VorD(5).  For  a  fixed  r,  let 
ST  =  {p  e  S  |  r  -  r  <  tp  <  r  +  r}.  Then  VorD(S,T)  =  \oyd(St,t). 
So  while  computing  Vorn(S,  r),  we  only  render  %•’ s  for  pt  e  ST. 

Triangulating  yt.  As  mentioned  in  Section  2,  the  GPU  can  ren¬ 
der  only  linear  objects.  Therefore  we  need  to  approximate  each  yt 
(or  its  truncated  version)  by  a  triangulated  surface,  yf.  Here  we 
describe  the  triangulation  procedure  for  the  case  when  d(-,  •)  is  a 
convex  function.  For  technical  reasons,  we  also  assume  that  d(-,  •) 
does  not  have  a  cross  term  between  t  and  x,y,  i.e.,  we  can  sep¬ 
arate  spatial  and  temporal  terms.  This  implies  that  d(-,  •)  can  be 
restricted  to  define  the  distance  between  two  points  in  R2.  We  as¬ 
sume  that  y/  is  such  that  | tt  -  r\  <  r.  Next,  we  clip  yt  within  points 
n  e  R2  s.t.  d{n,  p* )  <  r.  By  our  assumption  that  there  are  no  cross 
terms,  all  points  on  the  boundary  of  the  clipped  surface  yt  have  the 
same  z-value.  Let  yt  also  denote  the  resulting  surface  patch.  Next, 
we  choose  a  parameter  m  and  compute  m  level-set  curves  (contour 
lines)  at  regular  height  intervals  on  yf,  i.e.,  intersect  yf  with  m  hor¬ 
izontal  planes  (e.g.  dashed  circles  in  Figure  4(c).  Each  curve  is 
a  convex  closed  curve,  which  we  approximate  by  a  convex  &-gon, 
for  some  parameter  k,  by  choosing  k  points  on  the  curve.  We  then 
create  triangles  between  adjacent  k-gons  using  2 k  triangles.  The 
resulting  triangulated  surface  y?,  composed  of  2 km  triangles,  ap¬ 
proximates  7/.  See  Figure  4(c).  the  approximation  error  can  be 
controlled  by  choosing  the  values  of  m  and  k  carefully. 

The  case  of  Euclidean  distance.  Finally,  we  show  that  if  d(-,  •)  is 
a  weighted  Euclidean  distance  function  as  defined  in  (3),  there  is 
a  much  better  way  of  rendering  yf  using  the  so-called  lifting  trans¬ 
form  [6],  which  uses  fewer  triangles  and  induces  no  error  in  the 
distance  function  (though  the  region  of  influence  is  still  approxi¬ 
mated) 

For  1  <  i  <n,  define  gj  :  R2  — >  R  as 

gj(x,y)  =  ff(x,yf-x2-y2 

=  -2 xxi  -  2yy ,■  +  xj  +  y2  +  or(f,  -  r)2. 

Let 

gT  (-«,}’)  =  min  gj(x,y). 

I 


The  crucial  observation  is  that 

gj(x,y)  <  g)(x,y)  «■  fl(x,y )  <  fj(x,y), 

which  implies  that  the  xy-projection  of  the  lower  envelope  of  / 
and  g  are  identical,  therefore  PVor5(r)  is  also  the  projection  of  the 
graph  of  gr(v,  y).  We  therefore  render  the  graphs  of  gj  instead  of  jf. 
The  graph  ht  of  gT  is  a  2-dimensional  plane,  so  it  can  be  rendered 
directly.  In  order  to  compute  VorD(S),  we  need  to  truncate  ht  within 
the  region  of  influence  of  hi.  We  first  assume  that  \tt  -  r|  <  r.  Next, 
let  hi  be  the  portion  of  ht  clipped  within  the  region  of  influence  of 
Pu 

hi  =  {(x,y,z)  e  hi  |  (x  -  X/)2  +  (y-  yt)2  <  r2}, 

which  is  an  ellipse  in  3D.  We  wish  to  render  hi, . . . ,  hn.  We  approx¬ 
imate  hi  to  a  convex  &-gon  h ?  as  follows.  Let  cr  be  a  regular  ^-gon 
in  R2.  We  set  cr;  =  cr  +  p*.  h\  is  obtained  by  lifting  cr;  to  hh  i.e., 

h°  =  {( x,y,gj(x,y ))  |  (x,y)  e  cr,). 

For  each  vertex  v  of  cr/  there  is  a  vertex  (v,  gj)  in  hj.  We  triangu¬ 
late  h°  into  k  triangles  in  a  standard  manner.  Note  that  h ?  encodes 
the  function  gj  exactly;  it  only  approximates  the  region  of  influ¬ 
ence. 

The  GVor  algorithm.  Let  GVor(5,t)  denote  the  algorithm,  de¬ 
scribed  in  this  section,  to  compute  the  slice  of  the  truncated  pixe- 
lated  Voronoi  diagram  at  time  r,  i.e.,  Vorn(S,T)  which  is  the  same 
as  VorD(ST,T).  The  output  of  this  algorithm  is  the  combination  of 
color  buffer  C  and  depth  buffer  D,  which  together  describe  VorD(S ,  r). 
However,  we  apply  the  following  twist:  we  set  the  color  of  each 
triangle  of  y*  to  h(pt)  (instead  of  i).  Thus  C[tt]  stores  h(pt)  for 
all  pixels  n  e  Vor ns(pi,r).  We  define  the  frame  buffer  F  to  be 
(C,  D)  and  we  can  think  of  F  as  the  result  of  running  GVor,  i.e., 

F  =  GVor(S,t).  Ignoring  initialization  costs,  the  complexity  of 
GVor(S,t)  is  that  of  rendering  y\  (or  h\)  for  each  pL  within  the 
region  of  influence. 

4  Natural  Neighbor  Interpolation  on  Grids 

In  this  section  we  describe  our  algorithms  for  answering  natural- 
neighbor  interpolation  queries  on  a  M  X  M  X  T  grid  Q  of  points 
in  R3  using  truncated  pixelized  Voronoi  diagrams.  We  assume  that 
Q  has  origin  (0, 0, 0)  and  resolution  1  in  each  dimension,  i.e.,  Q  = 
{0, . . . ,  M  -  l}2  x  {0, . . . ,  T  -  1}.  Any  3D  grid  can  be  mapped  to 
Q  using  an  affine  transformation  and  using  the  weighted  Euclidian 
metric  to  preserve  the  structure  of  the  Voronoi  diagram. 

Since  the  region  of  influence  has  radius  r  and  height  2 r,  we  de¬ 
fine  B  =  Q  +  [-2 r  -  1/2, 2r  +  1  /2]3  —  large  enough  to  contain  all 
the  points  of  S  that  are  of  interest,  and  we  can  assume  that  S  c  B. 
For  simplicity  we  assume  that  re  N.  Let  s  e  N  be  a  parameter.  We 
discretize  B  into  a  sM  x  sM  x  T  3D  grid  of  voxels,  each  voxel  is 
a  box  of  size  j  x  |  and  height  1,  and  its  volume  is  p  -  By  our 
definition  of  B  and  the  voxels,  each  grid  point  of  Q  lies  at  the  center 
of  an  s  x  s  x  1  array  of  voxels  of  B,  i.e.,  the  temporal  resolution  of  B 
and  Q  is  the  same,  but  the  spatial  resolution  of  B  is  5  times  that  of 
Q.  One  can  choose  the  temporal  resolution  of  B  higher  than  that  of 
Q  but  it  is  not  important  in  our  applications  and  choosing  the  two 
to  be  the  same  simplifies  the  description  of  the  algorithm. 

Recall  the  definition  of  a  the  natural  neighbor  interpolation  h  : 

5  — >  R3  given  in  Equations  (1)  and  (2).  In  our  discretized  setting 
we  modify  h  as  follows: 

...  ZP6S  |VorDs(p)n  VorDsu{9](4)|/i(p)  N(q) 

Mq)‘ - iwwdi - ‘m  m 
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Figure  4.  (a,b)  Lower  envelope  of  8  points  that  were  used  for  PVorGS)  in  Figure  3(b,c).  (a)  shows  the  hyperboloids  from  a  side  view;  (b)  the  outer  cells 
of  the  Voronoi  diagram  are  infinite,  but  in  this  figure  their  sizes  are  limited  because  the  hyperboloids  are  of  a  limited  height,  (c)  The  triangulation  of  y; 
into  y?  using  two  triangle  strips  (m  =  2). 


Algorithm 

#  Copies 

Triangles  rendered  per  p  e  S 

#  of  passes 

ofF 

General 

Weighted  Euclidean 

~~rr~ 

1 

0{rzkmri) 

0(rzn) 

r 

2r+  1 

Oirkmri) 

0(rri) 

1 

2(2r  +  1) 

0(kmn) 

0(n ) 

Table  1.  Performance  of  the  three  algorithms.  \S\  =  n,  each  copy  of  F 
requires  storing  a  color  buffer  and  associated  depth  buffer  on  the  GPU. 


D(q)  is  simply  the  number  of  voxels  in  Vor^u^g).  N(q)  is  the 
sum  of  the  heights  represented  by  each  voxel  in  Vor a(S)  from  the 
Voronoi  cell  of  q. 

We  observe  that  the  region  of  influence  along  the  time  axis  im¬ 
plies  that  only  voxels  v  for  which  \tv  -  tq\  <  r  are  relevant  for  N(q) 
and  D(q).  Thus,  we  only  need  to  consider  2r  +  1  slices  of  B  corre¬ 
sponding  to  II tq_r, . . . ,  nt  +r.  Applying  this  to  (4)  we  get 


tq+r 

°(q)=  Y  I  VorDsu{<,](?,T)| 

T=tq—r 

tq+r 

N(q)  =  Z  Z  h(<p(7T,S)) 

T=tq-r  n&\oxa Su{q}(q,T) 


tq+r 

=  Y  D(q,r ) 

T=tq~r 

tq+r 

=  Y  N(q,T), 

T=tq-r 


where 


N(q,T)=  Y  h(ip(7:,S)), 

ne\ornSu{q](q,T) 

D(q,  r)  =  |VornSU(?)(4,T)|. 


(5) 

(6) 


We  refer  to  Table  1  for  an  overview  of  the  algorithms  presented  in 
this  section. 

Let  H  be  an  M  x  M  x  T  array  that  stores  the  height  at  all  points 
of  Q,  computed  using  the  discretized  natural-neighbor  interpolation 
defined  in  (4).  For  a  fixed  time  0  <  i  <  T,  let  Q ,•  denote  the  query 
points  in  time  slice  i  and  let  Ht  denote  the  corresponding  time  slice 
of  H.  For  a  fixed  /,  by  (5)  and  (6),  the  computation  of  Ht  involves 
computing  VorD(S l~r ,  i -  r), . . . ,  Vorn(5 l+r ,  i  +  r).  Since  we  compute 
each  HT  separately,  a  point  of  S  has  to  be  processed  by  the  GPU 
several  times.  We  describe  three  algorithms  that  provide  tradeoffs 
between  the  number  of  times  each  point  of  S  is  processed  (which 
we  refer  to  as  the  number  of  passes )  and  the  space  (on  the  GPU) 
used  by  the  algorithm. 

r2-pass  algorithm..  We  process  each  Qf,  for  0  <  i  <  T,  sepa¬ 
rately.  Equations  (5)  and  (6)  suggest  that  we  compute  N(q ,  r)  and 
D(q ,  r)  for  r  =  i  -  r, ...,/  +  r  and  for  all  q  e  Qf.  For  each  r 


we  first  compute  FT  =  GVor(S,t),  giving  us  a  representation  of 
Vorn(S,T)  =  VorD(S'T,T).  For  q  e  Qf,  let  yq  denote  the  graph  of 
the  function  J((x,y,T),  q),  which  is  a  2D  surface,  and  let  y^  be  the 
linear  approximation  of  yf,  truncated  within  R(q).  Beutel  et  al  [5] 
have  described  an  algorithm  that  uses  Fr  and  renders  the  surfaces 
{yq\  q  e  Q ;}  so  that  the  color  buffer  Cf>T  encodes  VorD(5  U  {q},  r)  for 
all  q  e  Qj.  Then  using  Fr  and  QfT  it  computes  N(q,  r)  and  D(q,  r). 
We  call  their  algoirthm  Interpolate(Ft,  i,  r).  Depending  on  the  size 
of  M,  r,  ^  and  GPU  constraints,  this  may  require  several  rendering 
passes  and  an  I/O-efficient  partitioning  algorithm;  we  refer  to  [5] 
for  details.  We  assume  that  the  Interpolate  procedure  returns  two 
MxM  arrays  JVv  and  if  s.t.  A'v[cy]  =  N(q,r)  and  Dv[q]  =  D{q, r). 
The  pseudo-code  of  the  algorithm  is  described  in  Algorithm  1 . 

Each  point  p  e  S  is  passed  to  the  computation  of  0(r )  slices  of 
VorD(ST,  r)  for  r  e  {[tp\  -r, . . . ,  ltp~\  +  r}.  Furthermore  each  Voronoi 
diagram  slice  Vor n(S,  r)  is  computed  2r+l  times  for  QT_r, . . . ,  Qr+r. 
Hence  each  point  of  S  is  passed  0(r2)  times. 


Algorithm  1  r2-pass 
for  i  <—  0  to  T  -  1  do 
Hi  +-  Ni  <-  A  <-  0 
for  r  e  {/  -  r, . . .  i  +  r]  do 
F  <—  GVor(5t,t) 

(Vv,  Dv)  =  Interpolate(F,  /,  r) 
Ni  <r-  Ni  +  N\  Di  4r-  Di  +  Dv 
Hi  <r-  Ni/Di 
return  H 


r-pass  algorithm.  The  overall  structure  of  the  algorithm  is  the 
same  as  before  but  we  save  some  computation  by  storing  what  we 
have  already  computed  and  re-using  it.  As  before,  let  frame  buffer 
Fr  =  GVor (S,  t)  refer  to  the  combination  of  CT  and  Ot  describing 
Vorn(5,r).  To  answer  NNI  queries  for  points  in  Qt  we  must  gen¬ 
erate  F;_r, . . .  F|+r  and  perform  the  interpolation  on  each  time-slice. 
When  we  proceed  to  Q;+1 ,  we  must  now  generate  F/+1_r, . . .  Fi+1+r. 
We  have  all  of  the  frame  buffers  except  the  last,  F/+i+r,  from  the  pre¬ 
vious  step  and  thus  they  can  be  reused  by  Interpolate  step.  We  only 
need  to  call  GVor (S *+1+r,  i+l+r).  See  Algorithm  2  for  the  pseudo¬ 
code.  This  algorithm  requires  storing  2r  +  1  frame  buffers:  2 r  to 
save  values  from  the  previous  query  grid  and  one  to  generate  the 
new  Voronoi  diagram.  Doing  this  for  Q;  for  all  0  <  i  <  M  in  order 
lets  us  generate  each  F*  only  once.  This  method  renders  each  point 
p  0(r)  times:  once  for  each  Vorn(5T,r)  forr  e  {[tp}-r, . . . ,  \tp~\+r}. 
1-pass  algorithm.  We  now  show  that  the  number  of  passes  can 
be  reduced  to  1  if  S  is  time  series  data  and  d(-,  •)  satisfies  certain 


69 


Algorithm  2  r-pass 
Create  F_r_i . . .  Fr_i 
for  i  <—  0  to  T  -  1  do 
Hi  <-  Ni  <-  A  <-  0 
Delete  F/_r_i  from  GPU 
F/+r  <—  GVoR(5i+r,  i  +  r) 
for  r  e  {/  -  r, . . .  i  +  r}  do 

(Av,  Dv)  =  Interpolate(Ft,  i,  t) 
A/  <-  A  +  Av,  Di  <-  Di  +  Dv 

A  +-  A-/A 

return  H 


properties  and  the  GPU  supports  certain  primitives.  For  simplicity, 
we  assume  that  S  =  E1  U  £2  U  •  •  • ,  where  the  f -coordinate  of  all 
points  in  Y!  is  i,  the  time  coordinate  of  the  ith  slice  of  Q,  and  d{ •,  •)  is 
the  weighted  Euclidian  distance.  The  first  assumption  can  easily  be 
removed.  The  second  assumption  can  be  relaxed  to  a  larger  family 
of  distance  functions,  but  identifying  this  family  is  technical  and 
omitted  from  this  version.  Recall  that  Vor(Ez,  j)  is  the  projection  of 
the  lower  envelope  y)  of  the  functions 

g‘j,p(x,  y)  =  -2 xxp  -  2yyp  +  x2p+y2p  +  oj2(j  -  i )2  Vp  e  l!  (7) 

Since  the  last  term  in  (7)  does  not  depend  on  p ,  it  immediately 
implies  that  Vor(£z,  j)  is  the  same  for  all  j  and  that 

giJ(x,y)  =  gii(x,y)  +  aj2(j-i)2. 

Let  F*.  =  (Cl,  Bp  =  GVor (Y\j)  —  C.[7r]  stores  the  height  of  the 
point  nearest  to  n  and  B 1  stores  the  distance  from  n  to  its  nearest 
neighbor,  and  let  J1  -  F).  Then  FI  can  be  computed  from  fF  by 
adding  the  value  oj2(J  -  i)2  to  every  pixel  in  Bz.  Let  Shift(F,  8)  be 
the  procedure  that  adds  the  value  8  to  all  pixels  in  D.  Then 

¥‘j  =  SmFT(3\  cj2(J  -  i)2). 

Recall  that  SJ  =  {Jr(=_r  "Li+e.  Assuming  we  have  O'1  for  j  -  r  <  t  < 
j  +  r,  we  can  compute  the  frame  buffer  F j  =  (C j,  O  j)  correspond¬ 
ing  to  Vor D(Sj,  j ),  using  the  procedure  CoMBiNE(T7-r, . . . ,  T7+r)  as 
follows.  We  first  compute  F^  for  j  -  r  <  £  <  j  +  r,  and  set 

07-[7r]  =  min 

j—r<{< j+r  J 

and  finally  set  Cj[n]  to  the  contents  of  C^n]  if  B j[n]  =  ]D^ [tt] .  The 
pseudo-code  can  be  found  in  Algorithm  3. 


Algorithm  3  Combine(T7  r, . . . ,  T7+r) 
for  all  n  e  F7  do 

Cj[n]  <—  0,  B;-[7r]  <—  oo 
for  i  e  j  -  r, . . . ,  j  +  r  do 
F*.  <—  Shift(5F,  cj2{j  -  i)2) 
for  all  n  e  F7  do 

if  Wj[7r]  <  Bfi7r]  then 
Dj[n]  Bz  [tt] 

Cj[7T]  4-  CjlTT] 
return  F j 


With  these  procedures  at  hand,  we  modify  the  r-pass  algorithm 
as  follows,  so  that  each  point  is  processed  once.  See  Algorithm  4 
for  the  pseudo-code.  The  algorithm  maintains  the  invariant  that 
while  processing  Qf,  it  maintains  F/_r, . . . ,  F;+r  and  also  . . . ,  Jl+2r . 
While  processing  of  Q /,  the  algorithm  first  computes  F+2r  using 
GVoR(Ez+2r,  i  +  2 r).  Next,  instead  of  computing  ¥i+r  by  invoking 
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! 
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Figure  6.  The  process  of  creating  F*  from  sets  of  data  V  rendering  each 
point  only  once. 


GVor (Sl+r,  i  +  r),  it  uses  the  procedure  Combine(5P,  . . . ,  <Jl+2r).  The 
rest  of  the  procedure  is  the  same.  Note  that  each  point  is  processed 
only  once. 


Algorithm  4  1-pass 

Create  J~l . . .  (3'2r~l  and  F_r_i . . .  Fr_i 
for  i  <—  0  to  T  -  1  do 

H(Qi)  <-  N(Qi)  <-  D(Qi)  <-  0 
Delete  T~l  and  F/_r_i  from  GPU 
T+2r  <r-  GVoR(E/+2r,  i  +  2 r) 

F;+r  <—  CoMBINE(T/  .  .  .  3>2r) 
for  r  e  {i  -  r, . . .  i  +  r]  do 

(Vv,  Dv)  =  Interpolate(Ft,  i,  r) 
Ni  <r-  Ni  +  N\  Di  <r-  Di  +  Dv 

Hi  <r-  NJDi 

return  H 


5  Experimental  Results 

We  now  discuss  the  details  of  the  implementation  and  testing  of 
TerraNNI.  It  is  implemented  in  C++  and  use  OpenGL  for  rendering 
on  the  GPU.  We  ran  our  experiments  on  an  Intel  Core2  Duo  CPU 
E6850  at  3.00GHz  with  8GB  of  internal  memory.  We  used  Ubuntu 
10.4  and  two  1TB  SATA  disk  drives  in  a  RAIDO  configuration. 
Additionally,  the  machine  contained  a  NVIDIA  GeForce  GTX  470 
graphics  card  running  CUDA  3.0.  This  card  has  1.2GB  of  memory, 
448  CUDA  cores,  and  14  multiprocessors. 

Data  sets.  We  ran  TerraNNI  on  two  data  sets.  We  performed  a 
majority  of  our  tests  on  LiDAR  data  collected  from  the  Nags  Head, 
NC  region  for  years  1997-1999,  2001,  2004,  2005,  2007,  and  2008. 
The  data  ranges  from  100,000  points  per  year  to  nearly  3  million 
points  per  year  and  comes  to  a  total  of  just  under  20  million  points. 
This  data  set  takes  up  382  megabytes  on  disk.  We  will  refer  to  this 
data  set  as  the  NC  coastal  data  set.  The  data  is  freely  available  at 
[1]  and  was  provided  to  us  by  Helena  Mitasova. 

We  also  performed  tests  on  a  much  larger  LiDAR  data  set  col¬ 
lected  from  Fort  Leonard  Wood  in  Missouri  (data  courtesy  of  the 
U.S.  Army  Corps  of  Engineers)  in  2009  and  2010.  The  data  set 
consists  of  approximately  450  million  data  points  over  an  80.5km2 
region  and  takes  up  8.8GB  on  disk. 
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Algorithm 

P 

r 

Shape  Rendered 

Hyperboloid 

Plane 

Hyperboloid 

Plane 

Binning  Time 

17.89s 

17.53s 

13.98s 

17.2s 

GVor(S) 

8m  15s 

3m  16s 

lm  24s 

35.8s 

Draw  Query  Cones 

4m  Is 

2m  26s 

3m  44s 

2m  28s 

Interpolate 

3m  45s 

4m  18s 

2m  45s 

3m  5s 

Write  points 

5.2s 

4.54s 

3.47s 

4.53s 

Total  running  time 

16m  26s 

10m  23s 

8m  11s 

6m  32s 

Table  2.  Comparison  of  running  times  of  different  variants  of  our  algorithm  on  the  NC  coastal  data  set. 


Parameter  choices.  Within  our  experiments  there  are  numerous 
parameters  that  can  be  adjusted  to  tune  our  TerraNNI’s  speed  and 
quality  trade-offs.  Many  of  these  parameters  stem  from  the  2D  al¬ 
gorithm  [5]  and  we  refer  to  that  paper  for  further  details.  We  use  a 
weighted  Euclidean  metric  as  the  distance  function  for  all  our  tests. 
The  speed  of  the  algorithm  is  highly  dependent  on  the  number  of 
triangles  rendered.  In  rendering  the  weighted  Euclidean  metric  we 
tested  rendering  hyperboloids  yt  as  well  as  planes  hi.  In  rendering 
each  hyperboloid  we  set  k  =  50  and  m  =  5  such  that  each  y*  was 
composed  of  500  triangles.  As  discussed  in  Section  3,  when  work¬ 
ing  with  the  Euclidean  metric  we  can  apply  the  lifting  transform  to 
render  planes  rather  than  hyperboloids.  When  rendering  planes,  we 
clip  the  plane  ht  to  an  ellipse  ht,  we  approximate  it  by  a  rectangle 
h ?  circumscribing  ht,  and  partition  h ?  into  two  triangles. 

Efficiency.  The  largest  data  set  in  our  tests  was  the  Ford  Leonard 
Wood  data  set  which  covers  two  years.  Here  we  created  a  3D  grid 
with  a  spatial  resolution  of  5m  (a  total  of  4  million  query  points) 
and  positioned  in  time  halfway  between  the  2009  and  2010  surveys. 
This  run  took  26  minutes  to  complete. 

The  NC  coastal  data  set  is  interesting  due  to  the  larger  temporal 
range  of  the  data.  We  used  this  data  set  to  compare  various  trade¬ 
offs  in  the  r-pass  and  r2-pass  algorithms.  We  ran  different  variants 
of  TerraNNI,  every  time  producing  a  3D  grid  with  at  a  spatial  1 
meter  resolution  over  a  2.3  x  1.8km2  region.  The  3D  grid  has  12 
slices  in  time,  corresponding  to  producing  an  output  every  half  year 
for  6  years  in  total.  This  3D  grid  has  a  total  of  about  48  million 
voxels.  The  results  can  be  found  in  Table  2.  The  table  shows  how 
much  time  TerraNNI  spends  in  each  phase,  and  include  timings  for 
the  variant  using  the  lifting  transform  where  we  only  have  to  render 
a  plane  (two  triangles)  per  input  and  query  point.  As  can  be  seen  in 
the  table,  the  r-pass  algorithm  takes  6  minutes  and  32  seconds  using 
planes,  approximately  60%  of  the  time  of  the  r2-pass  algorithm,  and 
takes  8  minutes  and  1 1  seconds  when  using  hyperboloids,  50%  of 
the  time  of  the  r2-pass  algorithm.  We  save  between  40%  and  55% 
of  the  rendering  time  by  using  planes  rather  than  hyperboloids  since 
the  planes  consist  of  significantly  less  triangles. 

As  indicated  by  Table  2  the  Interpolate  procedure  takes  longer 
for  planes  than  hyperboloids.  As  described  above,  the  way  we  clip 
and  approximate  hyperboloids  and  planes  yt  and  hL  to  obtain  y\ 
and  h%  the  xy-projection  of  the  latter  covers  a  larger  area  in  the 
xy-plane,  and  thus  cover  more  pixels.  Consequently  Interpolate 
performs  additional  atomic  operations  in  this  case  (see  [5]  for  de¬ 
tails).  By  approximating  ht  with  a  polygon  h ?  with  more  vertices 
(k)  and  thus  increasing  the  number  of  triangles  into  which  h*  is  de¬ 
composed,  one  can  achieve  a  trade-off  between  the  time  spent  by 
the  GVor  and  Interpolate  procedures  and  choose  a  value  of  k  that 
minimizes  the  overall  running  time. 

We  also  compared  our  implementation  to  traditional  (exact)  CPU 
algorithms.  Note  that,  in  addition  to  being  confined  to  the  CPU, 
these  algorithms  also  attemp  to  robustly  and  exactly  compute  the 
Voronoi  diagrams.  We  used  the  NC  Coastal  data  set  as  a  starting 


point  for  these  tests  but  also  tried  with  smaller  synthetic  data  sets. 
Common  for  all  the  algorithms  below  is  that  they  have  problems 
storing  the  entire  Voronoi  diagram  in  memory,  and  in  practice  this 
means  that  the  available  system  memory  decides  how  large  data 
sets  they  can  handle.  Furthormore,  the  NC  Coastal  data  set  is  a  time 
series,  i.e.,  the  points  have  one  of  8  different  time  values.  Thus  there 
are  millions  of  coplanar  points  which  causes  robustness  issues  for 
exact  algorithms  and  we  often  had  to  randomly  perturb  the  points 
in  the  time  axis  to  work  around  this  issue.  Spatially,  the  points  tend 
to  cover  the  same  area  as  well,  leading  to  further  robustness  issues. 

The  Interpolate3d  [10]  package  is  a  freely  available  3D  NNI  im¬ 
plementations  for  performing  NNI  of  3D  vector  fields  on  the  CPU. 
Unfortunately  it  does  not  run  on  the  full  NC  Coastal  data  set,  ex¬ 
hausting  the  8GB  of  system  memory  after  2  hours,  about  half¬ 
way  during  the  construction  of  the  Voronoi  diagram  of  the  input 
points.  It  also  runs  out  of  memory  interpolating  15  million  points 
(the  NC  Coastal  data  set  has  20  million  points).  But  it  is  able  to 
complete  with  10  million  input  points,  and  interpolating  at  50,000 
query  points  in  about  three  hours,  with  the  memory  usage  peaking 
at  7.8GB.  Compare  this  to  the  r-pass  algorithm  using  planes  that  in¬ 
terpolates  at  48  million  query  points  using  the  full  20  million  point 
set  in  less  than  7  minutes.  The  Interpolate3d  algorithm  had  issues 
with  the  coplanar  points  of  the  original  data  set  and  we  perturbed 
the  points  in  time  to  complete  the  test. 

We  also  compared  TerraNNI  against  state  of  the  art  CPU-based 
algorithms  for  computing  the  Delaunay  triangulation,  the  dual  of 
the  Voronoi  diagram.  This  test  is  relevant  since  this  computation 
is  a  fundamental  part  of  computing  NNI.  We  used  the  algorithm 
from  the  Cgal  [2]  library  as  well  as  the  one  available  in  Qhull  [4]. 
Cgal  was  able  to  successfully  compute  the  Delaunay  triangulation 
of  the  NC  coastal  data  set,  using  took  1 1  minutes  and  with  a  peak 
memory  use  of  just  under  7GB.  Impressively,  it  ran  on  the  origi¬ 
nal  unperturbed  data  set.  However,  this  is  significantly  slower  than 
TerraNNI  which  takes  six  and  a  half  minutes  to  do  the  full  interpo¬ 
lation  and  only  35.8  seconds  of  this  time  is  spent  on  constructing 
the  pixelized  truncated  Voronoi  diagram.  Additionally  the  Cgal  al¬ 
gorithm  is  already  hitting  the  limit  on  our  system  memory  and  was 
unable  to  successfully  triangulate  the  full  Fort  Leonard  Wood  data 
set. 

Qhull  [4]  offers  a  variety  of  strategies  for  dealing  with  degen¬ 
eracies,  we  choose  to  perturb  the  input  points  along  the  time  axis. 
Qhull  runs  out  of  memory  after  a  couple  of  minutes  on  the  full  data 
set,  the  largest  data  set  we  were  able  to  handle  with  a  8GB  of  sys¬ 
tem  memory  was  3  million  points  which  took  3  and  a  half  minutes 
with  a  peak  memory  use  of  6GB.  With  4  milion  points  the  mem¬ 
ory  used  exceeded  the  8GB  of  system  memory  and  the  computation 
time  increased  dramatically  to  one  and  a  half  hour. 

6  Application  to  Change  Analysis 

In  this  section  we  describe  an  application  of  TerraNNI  to  geomor- 
phologic  analysis  of  multi-temporal  NC  coastal  LiDAR  data. 
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(a)  Regression  slope  values  for  cmin  =  0 


(b)  Regression  slope  values  for  cmin  =  0.85 
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Figure  7.  Regression  line  slope  values  from  year  2000  to  2006  superimposed  on  year  2000  DEM. 


Figure  1(c)  gives  a  map  of  Nags  Head,  NC.  The  focus  of  our  ap¬ 
plication  study  is  Jockey’s  Ridge  state  park  sand  dune.  Specifically, 
we  investigate  the  annual  sand  dune  growth  and  erosion  from  years 
2000  to  2006.  In  order  to  perform  the  study,  first  we  apply  Ter- 
raNNI  to  interpolate  a  digital  elevation  model  (DEM)  for  each  year 
from  2000  to  2006,  inclusively,  since  many  of  the  yearly  collects 
(e.g.,  2000,  2002,  and  2003)  are  not  available.  We  set  the  radius 
of  influence  to  2.  Second,  we  estimate  a  linear  regression  model 
for  each  pixel  and  all  of  its  temporal  neighbors  (i.e.,  years  2000  to 
2006).  For  example,  if  p(x,y,t)  is  the  pixel  value  (height)  at  lo¬ 
cation  ( x ,  y)  and  time  t,  a  linear  regression  model  is  estimated  for 
the  set  { p(x,y ,  t\), . . . ,  p(x,y ,  tn)}.  Third,  we  apply  the  change  anal¬ 
ysis  method  of  [16]  by  examining  the  rates  of  change  (i.e.,  slopes) 
of  the  model  estimates  for  pixel  observations  that  exhibit  strong 
linear  trends.  As  in  [16],  we  define  the  notion  of  a  coefficient  of 
determination  x :  a  measure  of  the  reliability  of  the  linear  model 
in  predicting  the  pixel  values  (response  variable),^  e  [0, 1]  is  ex¬ 
pressed  as  follows:  x  —  1  “  (SScrr/SStot)  where  SSerr  is  the  sum  of 
squared  errors  between  the  linear  model’s  estimates  and  response 
variable  and  SStot  is  the  total  sum  of  squared  deviations  of  the  re¬ 
sponse  variable  to  its  expectation  [7].  A  high^  implies  that  the 
SScrr  is  relatively  small  compared  to  SSt ot,  indicating  that  the  linear 
model  can  well  predict  the  sample  observations.  We  set  a  threshold 
value  cmin  and  choose  those  pixels  for  which  x  >  cmin.  The  thresh¬ 
old  cmin  can  be  regarded  as  the  minimum  proportion  of  the  variation 
in  the  response  variable  that  is  captured  by  the  linear  model. 

Figure  7(a)  depicts  the  slope  values  for  the  entire  sand  dune  re¬ 
gion  for  Cmin  =  0.  Strong  erosions  (negative  slopes)  are  observed  in 
the  northwest  region  with  extensive  accumulation  (positive  slopes) 
in  the  southeast  area  of  the  dune.  This  erosion  and  accumulation 
suggest  a  net  wind  force  pattern  movement  towards  the  southeast 
direction;  however,  these  observed  slope  values  do  not  convey  ad¬ 
ditional  information  on  the  nature  of  the  movements  throughout 
the  years.  But  by  using  the  linear  regression  estimates,  further  in¬ 
sights  into  the  inter-annual  patterns  can  be  harnessed  by  extracting 
those  composite  movements  that  exhibit  a  linear  trend.  We  can  de¬ 
termine  these  linearly  dependent  patterns  by  pruning  those  DEM 
pixels  that  have  large  deviations  from  the  regression  model,  which 
is  achieved  by  increasing  Cmin.  Figure  7(b)  shows  the  slope  values 
for  cmin  =  0.85.  In  this  figure,  we  can  see  regions  of  the  dune  which 
have  undergone  continuous  changes  following  a  predominantly  lin¬ 
ear  trend.  We  exemplify  these  changes  for  year  2002  to  2004  in 
Figure  8.  From  2002  to  2003,  the  measured  total  sand  growth  and 


erosion  are  20512  m3  and  82714  m3,  respectively.  For  years  2003 
to  2004,  similar  level  of  dune  activity  is  observed  with  total  growth 
and  erosion  volumes  of  33895  m3  and  74809  m3,  respectively.  Be¬ 
cause  the  proposed  NNI  approach  can  efficiently  interpolate  the 
missing  DEMs  in  both  space  and  time,  geomorphologic  analyses 
such  as  the  one  demonstrated  in  this  case  study  can  be  rapidly  and 
effectively  executed  on  large  areas  and  at  high  resolutions. 

7  Conclusion 

In  this  paper  we  have  presented  three  algorithms  for  performing 
(discretized)  NNI  on  a  3D  grid  using  a  GPU.  The  three  algorithms 
provide  different  tradeoffs  between  GPU  computational  complexity 
and  GPU  memory  requirement  ranging  from  storing  one  buffer  on 
the  GPU  and  processing  each  input  point  a  quadratic  (in  r )  number 
of  times,  to  processing  each  input  point  one  time,  but  with  memory 
use  linear  in  the  time  region  of  influence.  Furthermore,  we  used 
the  lifting  transform  to  limit  the  polygonal  complexity  of  the  sur¬ 
faces  used  for  each  point,  shifting  the  bottleneck  away  from  ren¬ 
dering  triangles.  Our  algorithm,  and  its  implementation,  scale  to 
very  large  data  sets  and  the  underlying  discretization  works  around 
the  robustness  problems  that  add  additional  complexity  to  existing 
CPU-based  algorithms.  We  are  unaware  of  any  robust  implementa¬ 
tion  of  NNI  for  points  in  R3  that  can  handle  large  data  sets. 

Our  experimental  results  gives  an  example  of  an  application  of 
our  algorithm  and  suggests  that  is  practical  in  real  world  situations, 
outperforming  exact  CPU  algorithms  on  medium  sized  data  sets 
and  enabling  the  efficient  processing  on  even  larger  data  sets  with 
limited  memory  requirements.  We  plan  on  making  our  implemen¬ 
tation  of  TerraNNI  publicly  available. 
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